Bayesian Learning for Data-Efficient Control
Applications that learn to control unfamiliar dynamical systems with increasing autonomy are ubiquitous. From robotics to finance to industrial processing, autonomous learning helps obviate a heavy reliance on experts for system identification and controller design. Real-world systems are often nonlinear, stochastic, and expensive to operate (e.g. slow, energy-intensive, prone to wear and tear). Ideally, therefore, nonlinear systems should be identified with minimal system interaction. This thesis considers data-efficient autonomous learning of control of nonlinear, stochastic systems. Data-efficient learning critically requires probabilistic modelling of dynamics. Traditional control approaches use deterministic models, which easily overfit data, especially small datasets. We use probabilistic Bayesian modelling to learn systems from scratch, similar to the PILCO algorithm, which achieved unprecedented data efficiency in learning control of several benchmarks. We extend PILCO in three principal ways. First, we learn control under significant observation noise by simulating a filtered control process using an analytically tractable framework of Gaussian distributions. In addition, we develop the ‘latent variable belief Markov decision process’ for settings where filters must predict under real-time constraints. Second, we improve PILCO’s data efficiency by directing exploration with predictive loss uncertainty and Bayesian optimisation, including a novel approximation to the Gittins index. Third, we take a step towards data-efficient learning of high-dimensional control using Bayesian neural networks (BNNs). Experimentally, we show that although filtering mitigates the adverse effects of observation noise, much greater performance is achieved when controllers are optimised with evaluations faithful to reality: by simulating closed-loop filtered control whenever closed-loop filtered control will be executed. Controllers are thus optimised with respect to how they are used, outperforming filters applied to systems optimised by unfiltered simulations. We show that directed exploration improves data efficiency. Lastly, we show that BNN dynamics models are almost as data-efficient as Gaussian process models. These results indicate that data-efficient learning of high-dimensional control is possible, as BNNs scale to high-dimensional state inputs.
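For intuition, here is a minimal, self-contained sketch (not the thesis code) of the model-based loop the abstract describes: fit a probabilistic Gaussian-process dynamics model to a toy system, simulate uncertainty-aware rollouts under that model, and pick controller parameters that minimise expected cost. The sampling-based uncertainty propagation, the linear controller, and all names are illustrative assumptions; PILCO itself propagates Gaussian state distributions analytically by moment matching and optimises the controller by gradient methods.

```python
# Sketch of a PILCO-style loop: probabilistic model + simulated policy search.
# Hypothetical toy example, not the thesis implementation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Toy 1-D system: x' = x + 0.1*(u - x) + noise; learned purely from data.
X = rng.uniform(-1, 1, size=(30, 2))            # columns: state, action
y = X[:, 0] + 0.1 * (X[:, 1] - X[:, 0]) + 0.01 * rng.standard_normal(30)

gp = GaussianProcessRegressor().fit(X, y)        # probabilistic dynamics model

def rollout_cost(theta, x0=1.0, horizon=20, n_samples=50):
    """Expected cost of a linear controller u = theta * x under the GP model.
    Uncertainty is propagated here by Monte Carlo sampling; PILCO instead
    moment-matches Gaussian state distributions analytically."""
    x = np.full(n_samples, x0)
    cost = 0.0
    for _ in range(horizon):
        u = theta * x
        mu, std = gp.predict(np.column_stack([x, u]), return_std=True)
        x = mu + std * rng.standard_normal(n_samples)   # sample next states
        cost += np.mean(x ** 2)                          # quadratic state cost
    return cost

# Crude policy search: choose the gain with lowest simulated expected cost.
thetas = np.linspace(-1.0, 1.0, 41)
best = min(thetas, key=rollout_cost)
print(f"best gain ~ {best:.2f}")
```

Because the controller is evaluated only inside the learned probabilistic simulation, each real-system trial contributes data rather than being spent on trial-and-error, which is the source of the data efficiency the abstract claims.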
RAP: Risk-Aware Prediction for Robust Planning
Robust planning in interactive scenarios requires predicting the uncertain future to make risk-aware decisions. Unfortunately, due to long-tail safety-critical events, the risk is often underestimated by finite-sampling approximations of probabilistic motion forecasts. This can lead to overconfident and unsafe robot behavior, even with robust planners. Instead of assuming the full prediction coverage that robust planners require, we propose to make prediction itself risk-aware. We introduce a new prediction objective to learn a risk-biased distribution over trajectories, so that risk evaluation simplifies to an expected cost estimation under this biased distribution. This reduces the sample complexity of risk estimation during online planning, which is needed for safe real-time performance. Evaluation results in a didactic simulation environment and on a real-world dataset demonstrate the effectiveness of our approach.

Comment: 23 pages, 14 figures, 3 tables. First two authors contributed equally. Conference on Robot Learning (CoRL) 2022 (oral).
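As a toy illustration of the sampling argument (my sketch under simplifying assumptions, not the paper's learned model), the conditional value-at-risk of a scalar Gaussian "cost" is an ordinary expectation under the tail-conditioned distribution, so a predictor biased toward that tail needs far fewer samples than naive sampling from the unbiased forecast:

```python
# Hypothetical illustration: risk as an expectation under a biased distribution.
import numpy as np
from scipy.stats import norm, truncnorm

rng = np.random.default_rng(0)
alpha = 0.95                      # risk level
var = norm.ppf(alpha)             # value-at-risk of a standard-normal cost

# Baseline: finite-sampling CVaR estimate from the unbiased forecast.
# Only ~5% of samples land in the tail, so the estimate is noisy.
c = rng.normal(size=200)
cvar_naive = c[c >= np.quantile(c, alpha)].mean()

# Risk-biased alternative: every sample comes from the tail-conditioned
# distribution, so a handful of draws estimates the same risk. (The paper
# learns a biased trajectory distribution; this truncated normal stands in.)
biased = truncnorm.rvs(a=var, b=np.inf, size=10, random_state=0)
cvar_biased = biased.mean()

cvar_exact = norm.pdf(var) / (1 - alpha)   # closed form for the Gaussian
print(cvar_naive, cvar_biased, cvar_exact)
```

The paper's risk-biased predictor plays the role of the truncated distribution here: once risk is an expected cost under the biased samples, the planner no longer needs to draw enough unbiased samples to hit the long tail.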
Outcome-Driven Reinforcement Learning via Variational Inference
While reinforcement learning algorithms provide automated acquisition of optimal policies, practical application of such methods requires a number of design decisions, such as manually designing reward functions that not only define the task, but also provide sufficient shaping to accomplish it. In this paper, we view reinforcement learning as inferring policies that achieve desired outcomes, rather than as a problem of maximizing rewards. To solve this inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function which can be learned directly from environment interactions. From the corresponding variational objective, we also derive a new probabilistic Bellman backup operator and use it to develop an off-policy algorithm to solve goal-directed tasks. We empirically demonstrate that this method eliminates the need to hand-craft reward functions for a suite of diverse manipulation and locomotion tasks and leads to effective goal-directed behaviors.

Comment: Published in Advances in Neural Information Processing Systems 34 (NeurIPS 2021).
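To make the reward-as-inference idea concrete, here is a minimal sketch (my paraphrase with an assumed Gaussian outcome model, not the paper's derivation): the hand-crafted reward is replaced by the log-likelihood of the desired outcome under a probabilistic model of the reached state, which is automatically shaped in the sense that it increases smoothly as the agent approaches the goal.

```python
# Hypothetical sketch: a shaped reward from outcome log-likelihood.
import numpy as np

def log_gaussian(x, mu, sigma):
    """Log density of an isotropic Gaussian N(x; mu, sigma^2 I)."""
    return -0.5 * np.sum(((x - mu) / sigma) ** 2 + np.log(2 * np.pi * sigma ** 2))

def outcome_reward(next_state, goal, sigma=0.5):
    """Reward r(s, a, g) = log p(g | s') under an assumed Gaussian outcome
    model centered on the reached state: no manual shaping terms needed."""
    return log_gaussian(goal, next_state, sigma)

goal = np.array([1.0, 1.0])
for s_next in [np.zeros(2), np.array([0.8, 0.9]), np.array([1.0, 1.0])]:
    print(s_next, outcome_reward(s_next, goal))
```

In the paper this reward emerges from the variational objective rather than being postulated, and the accompanying probabilistic Bellman backup propagates it through an off-policy algorithm; the sketch only shows why an outcome log-likelihood is inherently well shaped.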